For this assignment, I used remake to automate a data analysis pipeline.

- [remake.yml]() lists all the targets and their dependencies. Every step in the file is explained with comments.
- [code.R]() contains the functions (rules) that relate the targets to one another. Every step in the file is explained with comments.
- How it works: the commands in remake.yml call the functions defined in code.R to generate the targets, which automates the pipeline. Refer to those two files for a more detailed explanation.

This is what the processed gapminder data looks like:

    knitr::kable(head(read.csv("summary_dat.csv")))

| X | country | continent | year | lifeExp | pop | gdpPercap | newpop | weight | weighted_mean_gdp |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 | 0.8425333 | 0.0001670 | 0.1301948 |
| 2 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 | 0.9240934 | 0.0001832 | 0.1503842 |
| 3 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 | 1.0267083 | 0.0002035 | 0.1736474 |
| 4 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 | 1.1537966 | 0.0002287 | 0.1912753 |
| 5 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 | 1.3079460 | 0.0002593 | 0.1918807 |
| 6 | Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 | 1.4880372 | 0.0002950 | 0.2319102 |
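
The derived columns in the table above appear consistent with newpop = pop / 1e7 and weighted_mean_gdp = weight * gdpPercap, with weight proportional to newpop. A minimal sketch of such a transformation is given below. It is not the actual process_data() from code.R: the name process_data_sketch is hypothetical, and normalising the weights by the total over all rows is an assumption.

```r
# Hypothetical re-implementation for illustration only;
# the real process_data() is defined in code.R.
process_data_sketch <- function(df) {
  # Rescale population to units of ten million (matches newpop in the table).
  df$newpop <- df$pop / 1e7
  # Assumption: each weight is the row's share of the total rescaled population.
  df$weight <- df$newpop / sum(df$newpop)
  # Contribution of each row to the population-weighted GDP per capita.
  df$weighted_mean_gdp <- df$weight * df$gdpPercap
  df
}

# Toy input built from the first two rows of the table above.
toy <- data.frame(
  country   = c("Afghanistan", "Afghanistan"),
  year      = c(1952, 1957),
  pop       = c(8425333, 9240934),
  gdpPercap = c(779.4453, 820.8530)
)
result <- process_data_sketch(toy)
print(result)
```

With only two rows in the toy input, the weight and weighted_mean_gdp values will differ from the table, because the weights there were normalised over a much larger set of rows.
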

Fig 1: Trend of mean GDP weighted by population over time, by continent

Fig 2: Scatterplot of mean GDP weighted by population vs. lifeExp, by continent

remake::diagram() generates the dependency diagram from the remake.yml file, which is very convenient. Reading the diagram backwards, we can see that the final product, report.html, depends on the two figures and report.md. The two figures depend on processed_gapminder_data, which depends on gapminder_df, which in turn depends on gapminder.tsv.

In other words, we first download gapminder.tsv using the function downl_tsv(). We then read it in as gapminder_df. Next, we mutate gapminder_df to add weighted_mean_gdp using the function process_data() and call the resulting data frame processed_gapminder_data. Finally, we create two plots from processed_gapminder_data using the plot_gdp_year() and plot_gdp_lifeExp() functions, which together with report.md produce the final report.html.

    remake::diagram(remake_file = "remake.yml")
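
The dependency chain described above could correspond to a remake.yml along the following lines. This is only a sketch, not the actual file: the figure file names (gdp_year.png, gdp_lifeExp.png), the read.delim() call, and the render_report() helper are hypothetical placeholders, and the arguments of downl_tsv() are not shown in this write-up.

```yaml
# Sketch only; see the real remake.yml for the actual definitions.
sources:
  - code.R

targets:
  all:
    depends: report.html

  # File target: downloaded by downl_tsv() (arguments not shown here).
  gapminder.tsv:
    command: downl_tsv()

  # Object target: the raw data read into R (read.delim is an assumption).
  gapminder_df:
    command: read.delim("gapminder.tsv")

  # Object target: the data with weighted_mean_gdp added.
  processed_gapminder_data:
    command: process_data(gapminder_df)

  # Plot targets: file names are hypothetical.
  gdp_year.png:
    command: plot_gdp_year(processed_gapminder_data)
    plot: true

  gdp_lifeExp.png:
    command: plot_gdp_lifeExp(processed_gapminder_data)
    plot: true

  # Final report: render_report() is a hypothetical helper in code.R.
  report.html:
    depends:
      - gdp_year.png
      - gdp_lifeExp.png
    command: render_report("report.md")
```

In a layout like this, each target names the command that builds it, and remake infers the dependencies from the command arguments and any explicit depends entries, which is what remake::diagram() draws.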